Presto vs Apache Drill - A Factual Comparison
Querying large data sets quickly and efficiently is increasingly important for enterprises. Presto and Apache Drill are two distributed SQL query engines built for that job. Both offer high speed and flexibility, but they differ in their implementation and capabilities.
What is Presto?
Presto is a distributed SQL query engine designed to query data where it’s stored. It can query data from multiple data sources such as Hadoop, Cassandra, MySQL, and more. Presto has a modular architecture, which separates the query engine from the storage layer, making it easy to add new connectors for data sources.
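To make the separation between engine and storage concrete, here is a minimal Python sketch. It is not Presto's actual connector API: the `Connector` interface, the `catalogs` registry, and the `hash_join` helper are hypothetical names chosen for illustration. The point is only that the engine-side code sees rows, not storage details, so a query can combine a relational source with a non-relational one.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, object]


class Connector(Protocol):
    """Hypothetical storage-side interface: the engine only needs rows."""

    def scan(self, table: str) -> Iterable[Row]: ...


class SqliteConnector:
    """Serves rows from a relational source (an in-memory SQLite database)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.row_factory = sqlite3.Row

    def scan(self, table: str) -> Iterator[Row]:
        # The table name is interpolated only because this is a toy example;
        # never do this with untrusted input.
        for row in self.conn.execute(f"SELECT * FROM {table}"):
            yield dict(row)


class MemoryConnector:
    """Serves rows from plain Python lists, standing in for a non-SQL store."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self.tables = tables

    def scan(self, table: str) -> Iterator[Row]:
        yield from self.tables[table]


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Engine-side inner join; it does not know where the rows came from."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "lin")])

catalogs: Dict[str, Connector] = {
    "relational": SqliteConnector(db),
    "events": MemoryConnector({"clicks": [
        {"user_id": 1, "page": "/home"},
        {"user_id": 1, "page": "/docs"},
        {"user_id": 3, "page": "/faq"},
    ]}),
}

joined = hash_join(catalogs["relational"].scan("users"),
                   catalogs["events"].scan("clicks"), key="user_id")
print(joined)
```

Adding a new data source in this sketch means writing one more class with a `scan` method; the join code stays unchanged.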
Presto's coordinator parses and optimizes each query, then breaks the resulting plan into stages whose tasks run in parallel on worker nodes. Presto also lets you join data across multiple data sources in a single query, which is a powerful feature for Big Data analysis.
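The idea of breaking a query into pieces that run on separate nodes can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Presto's scheduler: threads stand in for worker nodes, and the function names (`make_splits`, `partial_count`, `final_merge`) are made up for this example. The query being mimicked is roughly `SELECT status, count(*) FROM logs GROUP BY status`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, n_splits):
    """Divide the input into roughly equal chunks, one per task."""
    return [rows[i::n_splits] for i in range(n_splits)]


def partial_count(split):
    """Worker-side fragment: count rows per status within one split."""
    return Counter(row["status"] for row in split)


def final_merge(partials):
    """Coordinator-side fragment: combine the partial counts."""
    total = Counter()
    for partial in partials:
        total += partial
    return dict(total)


rows = [{"status": "ok"}] * 7 + [{"status": "error"}] * 3

# Each split is processed independently, as it would be on a separate node.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(partial_count, make_splits(rows, 3)))

result = final_merge(partials)
print(result)  # counts: ok = 7, error = 3
```

The two-phase shape (partial aggregation per split, then a final merge) is why adding workers can speed up a scan-heavy query: only the small partial results have to move between nodes.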
What is Apache Drill?
Apache Drill is an open-source distributed SQL query engine designed for Big Data. Like Presto, it can query large data sets from various data sources. It has a modular architecture in which query processing, data storage, and connectivity are separated into different modules. Apache Drill can query structured and semi-structured data, including JSON files and NoSQL databases.
A distinctive feature of Apache Drill is that it can run ad-hoc queries on data stored in Hadoop clusters without a schema or metadata being defined up front; it discovers the structure of the data as it reads it. This lets users query whatever data they have on hand and analyze it interactively.
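Querying without a predeclared schema can be illustrated with a small Python sketch. It does not use Drill itself; the sample records and the helper names (`read_records`, `discover_columns`, `select`) are assumptions made for this example. The column list is derived from the data at read time, and records that lack a field simply yield a null.

```python
import json

# Newline-delimited JSON in which records do not share the same fields.
raw = """\
{"id": 1, "name": "ada", "tags": ["admin"]}
{"id": 2, "name": "lin", "city": "Oslo"}
{"id": 3, "city": "Lima", "age": 41}
"""


def read_records(text):
    """Parse newline-delimited JSON without any predeclared schema."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def discover_columns(records):
    """Build the column list from the data itself, in first-seen order."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


def select(records, columns, where=lambda record: True):
    """Project the requested columns; missing fields become None."""
    return [tuple(record.get(c) for c in columns)
            for record in records if where(record)]


records = read_records(raw)
columns = discover_columns(records)
with_city = select(records, ["id", "city"], where=lambda r: "city" in r)

print(columns)    # ['id', 'name', 'tags', 'city', 'age']
print(with_city)  # [(2, 'Oslo'), (3, 'Lima')]
```

The trade-off of this approach is that type or naming inconsistencies in the data surface at query time rather than at load time.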
Comparison
Performance
Both Presto and Apache Drill are built for fast, interactive queries, but the available numbers favor Presto. In the Starburst benchmark listed in the references, Presto ran queries 6-14x faster than Apache Drill on a 1 TB dataset. Performance depends heavily on the data, the queries, and the cluster configuration, so results for your own workload may differ.
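Because results vary by workload, it is worth measuring both engines on your own queries. The following Python sketch shows one way to do that; it does not reproduce the benchmark above. The two `engine_*_query` functions are placeholders that just burn CPU, and you would replace them with calls that submit the same SQL to each engine through whatever client you use.

```python
import statistics
import time


def median_runtime(run_query, repeats=5, warmups=1):
    """Return the median wall-clock seconds of run_query over several runs."""
    for _ in range(warmups):
        run_query()  # warm caches so the first run does not skew the result
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Placeholder workloads (assumptions for this sketch, not real engine calls).
def engine_a_query():
    sum(range(50_000))


def engine_b_query():
    sum(range(500_000))


a = median_runtime(engine_a_query)
b = median_runtime(engine_b_query)
print(f"engine A is {speedup(b, a):.1f}x faster than engine B on this workload")
```

Using the median over several runs, after a warm-up, makes the comparison less sensitive to one-off effects such as cold caches.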
Query optimization
Presto has a more advanced query optimizer than Apache Drill, so it can produce more efficient plans and faster query processing times. Apache Drill's optimizer is simpler, which makes it easier to set up and understand.
Data sources
Both Presto and Apache Drill offer support for multiple data sources. However, Presto supports a wider range of data sources, including relational databases, NoSQL databases, and Hadoop file systems.
Query Language
Both Presto and Apache Drill use SQL as their query language. Apache Drill goes a step further by extending SQL with functions and syntax for semi-structured data such as JSON, so nested fields and arrays can be queried directly.
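One example of such an extension is Drill's FLATTEN function, which expands an array field into one row per element. The Python sketch below imitates that behavior on in-memory data to show what the operation does. It is an approximation for illustration: the `flatten` helper and the sample `customers` records are assumptions, and in this sketch a record with an empty array produces no output rows.

```python
def flatten(rows, array_field):
    """Emit one output row per element of each row's array_field."""
    out = []
    for row in rows:
        for element in row.get(array_field) or []:
            flat = {k: v for k, v in row.items() if k != array_field}
            flat[array_field] = element
            out.append(flat)
    return out


customers = [
    {"name": "ada", "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"name": "lin", "orders": [{"sku": "C3", "qty": 5}]},
    {"name": "sam", "orders": []},
]

flat_rows = flatten(customers, "orders")
skus = [(row["name"], row["orders"]["sku"]) for row in flat_rows]
print(skus)  # [('ada', 'A1'), ('ada', 'B7'), ('lin', 'C3')]
```

Once the array is flattened, each element can be filtered, grouped, or joined like any ordinary row.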
Conclusion
Both Presto and Apache Drill are capable Big Data tools, and each has its strengths and weaknesses. Presto has better query optimization and more extensive data source support, while Apache Drill is more accessible for ad-hoc queries and for semi-structured data. The choice between them depends on your needs, so evaluate both against your specific use case.
References
- Presto documentation. https://prestodb.io/docs/current/
- Apache Drill documentation. https://drill.apache.org/docs/
- Performance benchmarking of Presto and Apache Drill on a 1 TB dataset by Starburst. https://www.starburstdata.com/blog/presto-performance-benchmark/